Optimized data-independent acquisition approach for proteomic analysis at single-cell level

Background Single-cell proteomic analysis provides valuable insights into cellular heterogeneity allowing the characterization of the cellular microenvironment which is difficult to accomplish in bulk proteomic analysis. Currently, single-cell proteomic studies utilize data-dependent acquisition (DDA) mass spectrometry (MS) coupled with a TMT labelled carrier channel. Due to the extremely imbalanced MS signals among the carrier channel and other TMT reporter ions, the quantification is compromised. Thus, data-independent acquisition (DIA)-MS should be considered as an alternative approach towards single-cell proteomic study since it generates reproducible quantitative data. However, there are limited reports on the optimal workflow for DIA-MS-based single-cell analysis.
Methods We report an optimized DIA workflow for single-cell proteomics using Orbitrap Lumos Tribrid instrument. We utilized a breast cancer cell line (MDA-MB-231) and induced drug resistant polyaneuploid cancer cells (PACCs) to evaluate our established workflow.
Results We found that a short LC gradient was preferable for peptides extracted from single cell level with less than 2 ng sample amount. The total number of co-searching peptide precursors was also critical for protein and peptide identifications at nano- and sub-nano-gram levels. Post-translationally modified peptides could be identified from a nano-gram level of peptides. Using the optimized workflow, up to 1500 protein groups were identified from a single PACC corresponding to 0.2 ng of peptides. Furthermore, about 200 peptides with phosphorylation, acetylation, and ubiquitination were identified from global DIA analysis of 100 cisplatin resistant PACCs (20 ng). Finally, we used this optimized DIA approach to compare the whole proteome of MDA-MB-231 parental cells and induced PACCs at a single-cell level. We found the single-cell level comparison could reflect real protein expression changes and identify the protein copy number.
Conclusions Our results demonstrate that the optimized DIA pipeline can serve as a reliable quantitative tool for single-cell as well as sub-nano-gram proteomic analysis.
Supplementary Information The online version contains supplementary material available at 10.1186/s12014-022-09359-9.


Clinical Proteomics (Open Access). *Correspondence: huizhang@jhu.edu. Department of Pathology, Johns Hopkins University, Baltimore, MD 21287, USA. Full list of author information is available at the end of the article.

Introduction
Cells from the same living organism share a similar genomic background, yet they eventually differentiate into diverse cell types in different tissues or organs through the expression of different genes as proteins, which leads to cellular heterogeneity. Although the rapid development of genomic and transcriptomic methods made it possible to analyze genomic and transcriptomic alterations of cellular heterogeneity at single-cell level [1,2], the absence of protein amplification techniques hampered single-cell proteomic analysis. Originally, single-cell proteomic studies were limited to the detection of fewer than 15 targeted proteins from a single mammalian cell by means of flow cytometry [3,4], mass cytometry [5], and single-cell western blotting [6]. After decades of development, mass spectrometry (MS), a primary tool for analyzing the proteome and protein post-translational modifications (PTMs) of bulk samples, was applied to the first hypothesis-free mammalian single-cell proteomic analysis, known as Single Cell ProtEomics by Mass Spectrometry (SCoPE-MS) [7], in 2018. SCoPE-MS utilizes isobaric tandem mass tags (TMT) to label peptides from single cells along with a carrier sample containing a large excess of peptides to increase the detection of peptide fragment ions, especially for low-abundance peptides, during tandem mass spectrometry analysis (MS/MS). Several works have followed this concept and greatly expanded the single-cell proteomics field [8]. However, TMT carrier-based methods are limited in that data quality and quantitation depend heavily on the extremely imbalanced carrier ratio and the instrument dynamic range [9].
Data-independent acquisition (DIA)-MS is regarded as a consistent proteomic analytical method that fragments all precursor ions within a selected isolation m/z range, generating comprehensive MS/MS spectra [10]. DIA-MS can provide reproducible global quantitative data at minimal cost [10-13]. Various software tools have been developed to analyze DIA data [14,15]; they can be classified into spectral library-based approaches [16] and library-free approaches [17]. Spectral library-based DIA analysis is a peptide-centric method that usually requires building spectral libraries, either from corresponding DDA and/or DIA data of the same samples or from pre-built, publicly available spectral libraries. However, sample types and experimental conditions should be taken into consideration when building spectral libraries, especially when using external sources. Moreover, the spectral library size has a direct impact on DIA search results [18]; thus, an inappropriate library size would compromise the identification results [19,20]. Library-free DIA analysis, on the other hand, is a spectrum-centric method. Several tools are available for library-free analysis, including DIA-Umpire [17] and directDIA embedded in Spectronaut [21]. The library-free approach detects or deconvolutes chromatographic features of precursor-fragment ion groups to generate pseudo-MS/MS spectra, which allows multiple DIA raw files to be processed together. While generating pseudo-MS/MS spectra from one or more DIA raw files, an internal spectral library is constructed, analogous to a library pre-built from DDA data or external sources. Therefore, the internal spectral library size could influence the DIA study. Nonetheless, because the library-free approach relies only on the DIA data itself, it is highly sample-specific compared with the spectral library-based approach, which makes it more suitable for single-cell global proteome identification.
In this study, we evaluated the performance of the DIA-MS approach at the nano- and sub-nano-gram peptide levels using MDA-MB-231 cancer cells and drug-resistant PACCs induced by platinum or docetaxel treatment [22], in order to optimize the DIA-MS workflow for single-cell level proteomics. PACCs represent a large cancer cell state with high genome content that is induced by stress and treatment. We evaluated DIA performance under different liquid chromatography (LC), MS/MS, and data analysis settings. We found that a short 15-min LC gradient combined with library-free data analysis via directDIA allowed the identification of 3260 and 1530 proteins from 2 ng (corresponding to 10 PACCs) and 0.2 ng (corresponding to a single PACC) of peptides, respectively, with good reproducibility. These results demonstrate that our optimized DIA pipeline can serve as a reliable quantitative tool for single-cell proteomic analysis.

Cell culture and cell counting
All cells were maintained in RPMI-1640 (Gibco) supplemented with 10% fetal bovine serum (FBS) and 1% penicillin-streptomycin, and cultured under standard tissue culture conditions (37 °C, 5% CO2). MDA-MB-231 was originally obtained from ATCC. Cell lines are routinely authenticated and tested for mycoplasma.
Parental MDA-MB-231 samples were prepared by seeding 1 × 10^6 cells and incubating for 24 h. Adherent cells were washed with PBS prior to being lifted with Cell Dissociation Buffer (Thermo Fisher Scientific). Lifted cells were re-suspended in 20 mL of culture medium and applied to a primed 15 µm pluriStrainer® (The Cell Separation Company), and the flow-through was retained. 1 × 10^6 cells were collected and washed twice in PBS. Following the final wash, the supernatant was removed and the pellet was snap-frozen and stored at −80 °C.
Drug-induced PACC samples were prepared by seeding 1 × 10^6 MDA-MB-231 cells. After 24 h of incubation, cultures were treated with IC50 cisplatin or docetaxel for 72 h. After 72 h, surviving adherent cells were washed with PBS and lifted with Cell Dissociation Buffer (Thermo Fisher Scientific). Lifted cells were re-suspended in 20 mL of culture medium and filtered through a primed 15 µm pluriStrainer® (The Cell Separation Company). The flow-through from each filter was discarded, and the cells caught by the filter were harvested by flipping the filter upside-down and washing with 15 mL of media. The PACC sample was pelleted at 1000×g for 5 min, counted, and washed twice in PBS. Following the final wash, the supernatant was removed and the pellet was snap-frozen and stored at −80 °C.

Sample preparation
One million PACCs and parental MDA-MB-231 cancer cells (three samples from each cell type) were lysed in 60 µL of lysis buffer containing 8 M urea as described in the CPTAC protocol [23]. Briefly, cell lysates were centrifuged at 16,000×g for 12 min at 4 °C and protein concentrations were determined by the Pierce™ BCA protein assay (Thermo). The samples were reduced with 6 mM dithiothreitol for 1 h at 37 °C and then alkylated with 12 mM iodoacetamide for 45 min at room temperature in the dark. The samples were diluted to a 2 M urea concentration with 50 mM Tris buffer (pH 8.0). In the 2 M urea buffer, the samples were digested with Lys-C (Wako) at a 1 mAU:10 mg enzyme-to-substrate ratio for 2 h at room temperature, followed by the addition of trypsin (Promega) at the same ratio for overnight digestion at room temperature. After digestion, the mixtures were acidified with 50% formic acid to a final concentration of 1% formic acid (pH < 3). The digested peptides were desalted on C18 stage tips (3M) and dried with a Speed-Vac (Thermo). The dried peptides were redissolved in 3% acetonitrile with 0.1% formic acid, and the peptide concentration was determined using a NanoDrop™ (Thermo). Based on cell counts and peptide yields (Additional file 1: Table S1), each PACC sample yielded 0.2 ng of peptides per cell and each parental MDA-MB-231 sample yielded about 0.05 ng of peptides per cell. Starting from 1 µg of aliquoted peptides, a serial dilution was performed to obtain amounts equivalent to 100 cells, 10 cells, and 1 cell for the two cell sizes, corresponding to 20 ng, 2 ng, and 0.2 ng of digested peptides for PACCs, and 5 ng, 0.5 ng, and 0.05 ng of digested peptides for parental cells. All injections were spiked with 0.5 × iRT peptides (Biognosys) for internal retention time calibration.
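The cell-equivalent amounts above follow from simple multiplication of the per-cell yield by the cell count. A minimal Python sketch of that arithmetic, using the yields stated in the text (0.2 ng per PACC, about 0.05 ng per parental cell); the dictionary and function names are illustrative and not part of the published workflow:

```python
# Approximate peptide yield per cell in ng, as stated in the text.
PEPTIDE_NG_PER_CELL = {"PACC": 0.2, "MDA-MB-231": 0.05}


def peptide_amount_ng(cell_type: str, n_cells: int) -> float:
    """Peptide amount (ng) equivalent to n_cells of the given cell type."""
    return PEPTIDE_NG_PER_CELL[cell_type] * n_cells


# Print the cell-equivalent injection amounts for both cell types.
for cell_type in PEPTIDE_NG_PER_CELL:
    for n_cells in (100, 10, 1):
        amount = peptide_amount_ng(cell_type, n_cells)
        print(f"{cell_type}: {n_cells} cell(s) ~ {amount:g} ng")
```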

NanoLC-MS/MS analysis
The aliquoted peptides equivalent to 100 cells, 10 cells, and a single cell of the PACCs and parental MDA-MB-231 cells were analyzed using two different LC gradient times, 15 min and 120 min. All samples, from 1 µg to 0.05 ng of peptides, were separated on an EASY-nLC™ 1200 instrument (Thermo) with a hand-packed analytical column (75 µm i.d. × 26.5 cm length, packed with ReproSil-Pur 120 C18-AQ 1.9 µm beads) and a Picofrit with a 10 µm opening (New Objective). The column was heated to 50 °C with a Nanospray Flex™ Ion Source (Thermo). The elution flow rate was 200 nL/min with 0.1% formic acid in 97% H2O and 3% CH3CN as buffer A, and 0.1% formic acid in 90% CH3CN and 10% H2O as buffer B. Peptides were separated using 4-30% buffer B in the 15-min gradient and 7-30% buffer B in the 120-min gradient. All samples were analyzed on an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Fisher Scientific), and the parameters for the DIA method were as follows: resolution of 120,000, mass range of 350-1650 m/z, and maximum injection time of 60 ms for the MS1 scan; resolution of 30,000, HCD collision energy of 34%, mass range of 300-1600 m/z, and maximum injection time of 80 ms for the MS2 scan. For both MS1 and MS2, an RF Lens of 30% and a normalized AGC Target of 250% were applied. A total of 30 DIA raw files were acquired using the two LC gradient settings from the aliquoted peptide samples of 100 cells, 10 cells, and a single cell of the PACCs and the parental cancer cells (Additional file 1: Table S2).
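For readers who want the acquisition settings in a machine-readable form, the sketch below collects the values stated above in plain Python structures. The class, field, and function names are hypothetical; this is not an instrument method file or a vendor API, only a convenient summary. The gradient slope helper assumes a linear gradient, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ScanSettings:
    resolution: int
    mz_range: Tuple[int, int]  # (low, high) in m/z
    max_injection_ms: int


# Values copied from the method description in the text.
MS1 = ScanSettings(resolution=120_000, mz_range=(350, 1650), max_injection_ms=60)
MS2 = ScanSettings(resolution=30_000, mz_range=(300, 1600), max_injection_ms=80)
HCD_COLLISION_ENERGY_PCT = 34  # applies to MS2 only
SHARED = {"rf_lens_pct": 30, "normalized_agc_target_pct": 250}

# Gradient length in minutes -> (start %B, end %B).
GRADIENTS = {15: (4, 30), 120: (7, 30)}


def slope_pct_b_per_min(minutes: int) -> float:
    """Average change in %B per minute, assuming a linear gradient."""
    start, end = GRADIENTS[minutes]
    return (end - start) / minutes


for minutes in GRADIENTS:
    print(f"{minutes}-min gradient: {slope_pct_b_per_min(minutes):.2f} %B/min")
```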

DIA data analysis
The single-cell level DIA runs of the global proteome were analyzed via the library-free directDIA approach embedded in Spectronaut (version 14.10, Biognosys) with precursor and protein Q-value cutoffs of 1%. For the bulk sample analyses, five raw files (one from each of the PACC samples and one from each of the parental cell line samples) acquired from 1 µg injections were analyzed together in one directDIA search. For the single-cell analyses, the directDIA searches were conducted on five co-searching groups, each with a different combination of raw files. Thus, each co-searching group generated an internal library containing a different number of precursors (i.e., a different library size). The five co-searching groups of the single-cell analyses are as follows (Additional file 1: Table S3): one single-cell raw file (271 precursors from 0.05 ng peptides of a single MDA-MB-231 cell, referred to as GS-1r_M; 2258 precursors from 0.2 ng peptides of a single PACC, referred to as GS-1r_P); 10 raw files of all the single-cell injections from two LC gradients (2687 precursors, referred to as GS-10r); 16 raw files from the combination of GS-10r and six injections (two LC gradients) from 0.5 ng peptides (10 cells) of MDA-MB-231 cells (5787 precursors, referred to as GS-16r); 20 raw files from the combination of GS-10r and the ten-cell injections (two LC gradients) of PACC samples (16,496 precursors, referred to as GS-20r); and all 30 raw files (47,374 precursors, referred to as G-30r).
For the ten-cell analyses, the directDIA searches were conducted on five co-searching groups distinct from those of the single-cell analyses, as follows (Additional file 1: Table S4): one ten-cell raw file (3803 precursors from 0.5 ng peptides of an MDA-MB-231 cell sample, referred to as G10c-1r_M; 10,756 precursors from 2 ng peptides of a PACC sample, referred to as G10c-1r_P); 10 raw files of all the ten-cell injections from two LC gradients (17,118 precursors, referred to as G10c-10r); 16 raw files combining G10c-10r and all 100-cell injections of MDA-MB-231 cells (26,191 precursors, referred to as G10c-16r); all 30 raw files (47,374 precursors, referred to as G-30r); and 25 raw files combining G10c-10r, all 100-cell injections of PACC samples, and five 1 µg peptide injections (88,012 precursors, referred to as G10c-25r).
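The co-searching groups are easier to compare when laid out as a lookup table. The sketch below records, for each group named above, the number of raw files and the internal library size in precursors as given in the text; the variable and function names are illustrative only.

```python
# group name -> (number of raw files, precursors in the internal library)
SINGLE_CELL_GROUPS = {
    "GS-1r_M": (1, 271),
    "GS-1r_P": (1, 2258),
    "GS-10r": (10, 2687),
    "GS-16r": (16, 5787),
    "GS-20r": (20, 16496),
    "G-30r": (30, 47374),
}
TEN_CELL_GROUPS = {
    "G10c-1r_M": (1, 3803),
    "G10c-1r_P": (1, 10756),
    "G10c-10r": (10, 17118),
    "G10c-16r": (16, 26191),
    "G-30r": (30, 47374),
    "G10c-25r": (25, 88012),
}


def by_library_size(groups):
    """Group names ordered from smallest to largest internal library."""
    return sorted(groups, key=lambda name: groups[name][1])


print(by_library_size(SINGLE_CELL_GROUPS))
print(by_library_size(TEN_CELL_GROUPS))
```

Sorting by library size rather than by file count makes one detail visible: G10c-25r uses fewer raw files than G-30r but produces a larger internal library, because it includes the 1 µg injections.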
The PTM analyses were conducted by searching global DIA data (nano-gram and single-cell levels) against the pre-built PTM spectral libraries.

PTM spectral library generation
To analyze PTMs, the spectral libraries for Fig. 4a were generated from patient-derived xenograft (PDX) samples. The phosphopeptide spectral library was built using IMAC-enriched phosphorylation data of PDX samples (following the CPTAC standard protocol [24]), and the acetylation and ubiquitination spectral libraries were constructed using data from antibody-enriched PDX samples [25,26]. Each PTM spectral library from PDX samples was built from a single DIA run. Additionally, three phosphopeptide spectral libraries of different sizes (~ 42,000 precursors, ~ 87,000 precursors and ~ 142,000 precursors) were built using IMAC-enriched phosphorylation data of clear cell renal cell carcinoma (ccRCC) tumor tissues [24]. All PTM libraries were built using Pulsar embedded in Spectronaut.

LC gradient time for protein identification at single-cell levels
In general, optimal protein identification can be achieved using 1 µg of peptides for DIA-MS analysis in combination with a 120-min LC gradient for large cell populations (≥ 20,000 cells) and bulk cell samples [27]. However, such an approach may not be ideal for small cell populations. Therefore, we conducted a comparative analysis of different LC gradient settings for peptides equivalent to the hundred-cell, ten-cell, and single-cell levels. Of note, we used identical MS settings for both gradients. We first compared the two LC gradient settings for a single cell (0.05 ng of peptides), 10 cells (0.5 ng), and 100 cells (5 ng) of MDA-MB-231 cell samples by computing an identification ratio: the average number of proteins identified with the 15-min LC gradient divided by the average number identified with the 120-min LC gradient, each from single-run directDIA searches. We found that more proteins were identified with the 15-min LC gradient at the single-cell and ten-cell levels. As shown in Fig. 1a, the protein identification ratios of the 15-min to the 120-min LC gradient were greater than 1 for single and ten MDA-MB-231 cells. We observed a similar result for the single cisplatin-treated PACC (0.2 ng of peptides) (Fig. 1b), where the ratio of the 15-min to the 120-min LC gradient was also greater than 1. This indicates that fewer proteins were identified with the 120-min LC gradient and that a short LC gradient improves overall protein identification at single-cell level for peptide injection amounts < 2 ng. Therefore, we chose 15 min as our optimal LC gradient setting for single-cell level DIA analysis.
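The identification ratio described above is a ratio of two means. A minimal sketch of that calculation follows; the per-run protein counts are hypothetical placeholders chosen only to make the example run, not values from this study, and the function name is an assumption.

```python
from statistics import mean


def identification_ratio(ids_short, ids_long):
    """Mean protein IDs on the short gradient divided by the mean on the long gradient."""
    return mean(ids_short) / mean(ids_long)


# Hypothetical protein-group counts per run, for illustration only.
short_15min = [410, 395, 402]
long_120min = [300, 310, 290]

ratio = identification_ratio(short_15min, long_120min)
# A ratio above 1 means the short gradient identified more proteins on average.
print(f"15-min / 120-min identification ratio: {ratio:.2f}")
```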

Evaluation of global proteomic analysis at single-cell level
Besides investigating the suitable LC gradient for acquiring DIA-MS data at single-cell level, we also evaluated the search space of single-cell DIA data (i.e., the size of the internal library generated during the directDIA search) among the established co-searching groups. The DIA data of MDA-MB-231 cell samples (0.05 ng […] (Fig. 2a). As the size of the internal library increased to 5787 precursors (i.e., GS-16r), we observed the highest peptide and protein coverage for the single MDA-MB-231 cancer cell. A total of 1093 peptide precursors and 406 protein groups were identified using GS-16r (Fig. 2a), corresponding to gains of 303% and 222% at the peptide and protein levels, respectively, compared with the results obtained using GS-1r_M only. For the cisplatin-treated PACC at single-cell level (0.2 ng of peptides), 621 protein groups were identified using the directDIA approach to search against the internal library of GS-1r_P (2258 peptide precursors) (Fig. 2b). Moreover, we found that the co-searching group GS-20r (16,496 precursors) yielded the best identification, with 6153 peptide precursors and 1530 protein groups identified (Fig. 2b). Using GS-20r thus yielded gains of 172% and 146% at the peptide precursor and protein levels, respectively, relative to GS-1r_P. Of note, the number of identified proteins/peptides did not necessarily increase as the search space expanded.
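The reported gains can be reproduced as relative increases over the single-file search. The sketch below assumes that the precursor counts listed for GS-1r_M (271) and GS-1r_P (2258) serve as the baselines, which is consistent with the stated percentages; the function name is illustrative.

```python
def percent_gain(new: int, baseline: int) -> float:
    """Relative increase of new over baseline, in percent."""
    return 100.0 * (new - baseline) / baseline


# Precursors, single MDA-MB-231 cell: GS-16r versus GS-1r_M.
print(round(percent_gain(1093, 271)))   # 303
# Precursors, single PACC: GS-20r versus GS-1r_P.
print(round(percent_gain(6153, 2258)))  # 172
# Protein groups, single PACC: GS-20r versus GS-1r_P.
print(round(percent_gain(1530, 621)))   # 146
```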
As shown in Fig. 2c, the best precursor identification was obtained when the total number of precursors in the internal library was two- to four-fold the total number of precursors detected in the sample of interest. Our results suggest that the internal library size is critical to protein identification at single-cell and sub-nano-gram levels via the DIA-MS approach.
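One way to apply this observation is to compute the library-to-sample precursor ratio for a candidate co-searching group and check whether it falls in the reported two- to four-fold window. The sketch below does this for the single PACC searched with GS-20r. Taking the precursors identified in the sample under that search as the denominator is an assumption about how Fig. 2c defines the ratio, and the function names are hypothetical.

```python
def library_to_sample_ratio(library_precursors: int, sample_precursors: int) -> float:
    """Internal library size divided by precursors detected in the sample of interest."""
    return library_precursors / sample_precursors


def in_reported_window(ratio: float, low: float = 2.0, high: float = 4.0) -> bool:
    """True if the ratio lies in the two- to four-fold range described in the text."""
    return low <= ratio <= high


# Single PACC searched with GS-20r: 16,496 library precursors, 6153 identified.
ratio = library_to_sample_ratio(16_496, 6_153)
print(f"ratio = {ratio:.2f}, within window: {in_reported_window(ratio)}")
```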
Furthermore, we evaluated the co-searching methods at ten-cell level. We identified 1816 protein groups (Fig. 2d) and 3260 protein groups (Fig. 2e) from 10 MDA-MB-231 cells and 10 PACCs, respectively. Similarly, the best peptide precursor and protein identifications also fell within the two- to four-fold range between the internal library size and the detected precursors (Fig. 2f). In addition, we observed optimal protein identification via directDIA search for the single-cell and ten-cell injections when they were co-searched with injections of similar samples whose peptide amounts differed by about 10-fold (Additional file 1: Table S5). Overall, it was essential to use a co-searching internal library generated from similar samples during the directDIA search to enhance protein identification at the single-cell and nano-gram levels.
Fig. 2 Evaluation of different co-searching groups (internal libraries with different numbers of precursors) generated during directDIA search for single-cell and ten-cell proteomic DIA analysis. a Numbers of peptides and proteins identified from different co-searching groups at the single MDA-MB-231 cell level. b Numbers of peptides and proteins identified from different co-searching groups at the single PACC level. c The single-cell level protein identification towards the ratio of total precursors comparing to single cell level precursor identifications with matched co-searching precursors. d Numbers of peptides and proteins identified from different co-searching groups at the ten MDA-MB-231 cell level. e Numbers of peptides and proteins identified from different co-searching groups at the ten PACC level. f The ten-cell level protein identification towards the ratio of total precursors comparing to ten-cell level precursor identifications with matched co-searching precursors

Reproducibility on single-cell level proteomic analysis using DIA
After evaluating the LC gradient and the internal library size for DIA analysis at single-cell level, we investigated the inter-person reproducibility of our established workflow. Two sets of samples, each containing cisplatin- and docetaxel-treated PACCs and parental MDA-MB-231 cancer cells, were prepared and analyzed a month apart by two researchers following the procedures stated in the Experimental section. Here, we use the cisplatin-treated PACC sample and one MDA-MB-231 sample to demonstrate the reproducibility of our DIA method. We observed pairwise Spearman correlations > 0.80 for the MDA-MB-231 cell sample between the two sets at single-cell, ten-cell, and hundred-cell levels (Fig. 3a, b, c). We found similar results for the cisplatin-treated PACC samples, where Spearman correlations ≥ 0.83 were observed at single-cell, ten-cell, and hundred-cell levels (Fig. 3d, e, f). Taken together, these results demonstrate that our optimized DIA workflow provides robust quantitative global proteome profiling for single-cell DIA analysis, and that larger cell populations further improve the reproducibility.
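The pairwise Spearman correlation used here is the Pearson correlation of the ranked intensities. A self-contained sketch of that computation is shown below, written in pure Python so it runs without third-party packages. The protein intensities are hypothetical and serve only to exercise the code; in practice a statistics library routine would normally be used.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical intensities of the same five proteins in two preparations.
prep_a = [1.2e5, 3.4e4, 8.9e6, 2.2e5, 5.0e3]
prep_b = [1.0e5, 4.1e4, 7.5e6, 1.9e5, 9.0e3]
rho = spearman_rho(prep_a, prep_b)
print(f"Spearman rho = {rho:.2f}")
```

Because the statistic depends only on rank order, it is insensitive to the overall intensity scale, which differs between preparations and injection amounts.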

PTM analyses for nano-gram levels of peptides without enrichment
Protein modifications are important for the regulation of various protein activities and cellular signaling events, and alterations in PTMs are associated with many diseases, including cancer [28]. When conducting MS-based PTM analysis, PTM enrichment is an essential procedure; however, there are very few reports of enrichment strategies at the nano-gram/single-cell level for MS analysis. Thus, we established an alternative approach for PTM analysis at such a level by utilizing global proteomic DIA data and spectral libraries built from bulk samples. Unlike DDA-MS, DIA-MS co-fragments all peptide precursors within a selected m/z range to produce comprehensive MS2 spectra. The information on modified peptides should therefore be retained in the global data even without PTM enrichment. Accordingly, we were able to directly identify PTMs from the nano-gram level (i.e., 100 cells) of global proteomic DIA data using customized PTM spectral libraries for phosphorylation, acetylation, and ubiquitination.
Fig. 3 Reproducibility of the DIA-MS runs where the sample preparations were done a month apart by two different researchers. Pairwise Spearman correlation (ρ) of a, b, c protein intensities between two preparations and DIA-MS analyses of MDA-MB-231 cells at single-cell, ten-cell, and hundred-cell levels, and d, e, f protein intensities between two preparations and DIA-MS analyses of cisplatin-treated PACCs at single-cell, ten-cell, and hundred-cell levels
We first explored the possibility of identifying phosphorylation, acetylation, and ubiquitination from global DIA data at the nano-gram level using the spectral libraries built from PDX samples (Fig. 4a). We identified 72 phosphorylated peptides, 35 acetylated peptides, and 99 ubiquitinated peptides from 100 PACCs, indicating that PTMs can be found without enrichment.
We further evaluated the association between PTM spectral library size and PTM identification by examining the change in phosphopeptide identification from the nano-gram level of global DIA data, since a large collection of phosphopeptide-enriched DDA and DIA raw files from the CPTAC study [24] allowed the construction of spectral libraries with various sizes ranging from ~ 42 to ~ 141 K precursors. As shown in Fig. 4b, among the three phosphopeptide spectral libraries, the library containing ~ 84 K precursors gave the highest number of identifications for 100 MDA-MB-231 cells (5 ng of peptides) and 100 PACCs (20 ng of peptides), with 68 and 166 phosphopeptides with localized sites identified, respectively. These results suggest that PTM analysis at the nano-gram scale can be achieved by utilizing global DIA data along with a suitable PTM library built from bulk samples.

Application of single-cell level DIA approach to the drug resistant cancer cell study
To investigate whether the difference in cell size affected identification and protein expression patterns, we conducted a comparative analysis between PACCs (large cells) and MDA-MB-231 cells (smaller cells) at 1 µg peptide injection and at single-cell level of peptide injection. We observed a 98.5% overlap in protein identification between cisplatin-treated PACC and MDA-MB-231 samples (Fig. 5a), suggesting that they share a similar proteome profile regardless of cell size. At single-cell level, we examined the protein groups identified by co-searching all single-cell raw files (i.e., GS-10r, Fig. 2a). We identified 388 protein groups in the single MDA-MB-231 cell and 688 proteins in the cisplatin-treated PACC at single-cell level (Fig. 5b).
Although more proteins were identified using 1 µg peptide injections, we speculated that single-cell proteomic analysis could capture the real protein expression changes better than bulk proteomic analysis, which would assist in the study of cellular heterogeneity between PACC and parental MDA-MB-231 cells. We compared the protein fold changes as shown in Fig. 5c. At single-cell level, we found that the majority of proteins showed higher expression in the cisplatin-resistant PACC relative to the parental MDA-MB-231 cell, with Log2 fold changes ranging between 2 and 5, indicating increased levels of these proteins after treating MDA-MB-231 cells with cisplatin and transitioning to a PACC state, except a […] Single-cell level analysis may help to understand the mechanism by which drug-resistant PACCs form after drug treatment of MDA-MB-231 cells. As a lack of amplification of histone proteins during cell division is reported to lead to cell cycle elongation [29,30], MDA-MB-231 cells in which drug treatment inhibits the amplification of histone proteins such as HIST1H4A and HIST1H1E are unable to undergo division, which results in PACCs that are much larger than untreated MDA-MB-231 cells. The protein p62/SQSTM1 is a selective autophagy receptor that acts in a ubiquitylation-dependent manner [31,32]. Lack of amplification of p62/SQSTM1 protein might render the cells insensitive to drug-associated stress and […] [33,34]. We also observed similar p62/SQSTM1, HIST1H4A, and HIST1H1E protein expression patterns in the docetaxel-resistant PACC. These findings from single-cell level analysis reveal mechanisms of PACC formation and survival.
The difference in protein abundance was further evaluated by comparing drug-induced PACCs performed in replicates to MDA-MB-231 cells without treatment performed in triplicates. The protein intensity ratios of the HIST1H1E protein from PACC samples to MDA-MB-231 samples are summarized in a box plot, showing a statistically significant difference (p < 0.001) (Fig. 5d). Taken together, quantitative analysis of the single-cell proteome via the DIA approach can benefit our understanding of cellular heterogeneity and provide more accurate protein expression profiles, which may be misinterpreted at the bulk population level.
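The fold changes discussed in this section are log2 ratios of protein intensity between PACC and parental samples. The following sketch shows one simple way to compute such a value from replicate intensities. The numbers are hypothetical, and taking the ratio of replicate means is an assumption; the text does not specify how replicates were aggregated.

```python
from math import log2
from statistics import mean


def log2_fold_change(treated, control):
    """log2 of mean treated intensity over mean control intensity."""
    return log2(mean(treated) / mean(control))


# Hypothetical intensities for one protein, for illustration only.
pacc_intensities = [8.0e5, 7.6e5, 8.4e5]
parental_intensities = [1.0e5, 0.9e5, 1.1e5]

lfc = log2_fold_change(pacc_intensities, parental_intensities)
# Positive values mean higher abundance in PACCs; negative values mean lower.
print(f"log2 fold change (PACC / parental): {lfc:.2f}")
```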

Conclusions
Single-cell proteomic analysis provides insights into cellular heterogeneity, allowing the characterization of the cellular microenvironment, whereas proteomic analysis of bulk samples captures only a population average, hindering our understanding of the diversity in cellular functions. In this study, we established and optimized a single-cell proteomic analysis workflow that utilizes DIA-MS and the directDIA method to analyze the global proteome of MDA-MB-231 cancer cells and the matched drug-resistant PACCs. We first systematically evaluated aliquoted peptide samples from single-cell to 100-cell levels (0.05-20 ng) of the two cell types under different LC gradient settings. We found that the 120-min LC gradient was more suitable for peptide injection amounts > 2 ng (~ 10 PACCs), whereas the 15-min LC gradient was more appropriate for injection amounts < 2 ng (~ 10 PACCs). By applying and investigating the directDIA search method with different co-searching groups (i.e., internal libraries), we observed that an approximately four-fold difference between the internal library size and the total number of detected precursors of a DIA raw file produced the highest protein identification rate with good reproducibility [35]. Of note, 1500-3000 proteins have been identified from 10 to 140 mammalian cells (equal to 0.5-7 ng of peptides) using narrow-bore columns (i.d. 30 µm or 20 µm) coupled with low-flow-rate separation [36], whereas < 2000 proteins were identified from 2 ng aliquoted peptide samples using a special LC system [37,38]. Nevertheless, using our optimized workflow, 2 ng (~ 10 PACCs) of peptides allowed the identification of ≥ 3200 proteins, and 1500 proteins were identified from a single PACC (~ 0.2 ng of peptides), even with a normal-bore column (i.d. 75 µm) and a normal LC system.
Furthermore, identification of PTMs at the nano-gram/single-cell level without any PTM enrichment was achieved by directly searching the nano-gram level of global DIA data against pre-generated PTM libraries. More PTM sites may be identified if PTM libraries constructed from PTM enrichments of the same samples are used to search the global proteomic data [39]. Additionally, we were able to detect the cellular heterogeneity between PACCs and their parental MDA-MB-231 cells at single-cell level using our established workflow. In summary, we developed a novel approach to study small cell populations, including single cells, by using DIA-MS coupled with a short LC gradient and a directDIA search with an appropriately sized internal library, which can support quantitative single-cell proteomic and PTM analyses at high throughput.